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ATTITUDE DETERMINATION AND 
SENSOR ALIGNMENT VIA WEIGHTED 
LEAST SQUARES AFFINE TRANSFORMATIONS 

Paul B. Davenport 
Goddard Space Flight Center 
Greenbelt, Maryland 


ABSTRACT 

The position and orientation of a vehicle, sensor, actuator, etc., relative 
to a fixed set of axes (another vehicle, sensor, etc.) may be represented mathe- 
matically by an affine transformation (linear transformation plus a translation). 
Symbolically, Z = MX + V, where M is an mxn matrix, V and Z are mxl column 
vectors, and X is an nxl vector. This matrix equation also represents any linear 
relationship between known inputs X and measured outputs Z; e.g., if X is the 
first n powers of a scalar x then each component of Z is a polynomial in x. The 
weighted least squares estimate of M and V is discussed assuming that various 
measurements Z are given (along with the input X). Although there are m(n+l) 
parameters to be estimated, a simple weighting lunction allows a solution by 
inverting only an nxn matrix. This case, including constraints on M (orthogonal, 
rotation, symmetric and skew-symmetric) will be examined in detail. 
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ATTITUDE DETERMINATION AND 
SENSOR ALIGNMENT VIA WEIGHTED 
LEAST SQUARES AFFINE TRANSFORMATIONS 


MOTIVATION 

Many problems of navigation and guidance (sea, air, space) can be stated 
fundamentally as the determination of the orientation and/or position of an axis 
or axes relative to another set of axes (e.g., coordinate axes). The very heart 
of navigation is to determine the attitude and/or position of a vehicle so that 
the vehicle may be headed toward (or held into) a desired position and/or ori- 
entation. To perform these tasks numerous sensors and actuators are usually 
employed. This then requires the knowledge of the position and orientation of 
each device relative to the vehicle axes. Thus, there is the dual problem of 
determining the location of each instrument axes relative to the vehicle axes so 
that the orientation and/or position of the vehicle axes relative to some under- 
lying coordinate axes may be ascertained. 

The first problem (instrument alignment and calibration) is usually per- 
formed once or rather infrequently under laboratory conditions whereas the 
attitude and/or position evaluation is performed in the "field” by the navigator 
(human or computer). For precise missions of long duration the "navigator" 
may be required to perform instrument alignment and calibration calculations 
in addition to the attitude and/or position determinations. Gyro drift, thermal 
bending, stresses, fatigue, etc. may combine to produce unacceptable tolerances. 
In this event, it would be convenient if the navigator’s prior skills (computer 
algorithms) for determining orientation and/or position could be applied to the 
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alignment problem. Here, a mathematical model and techniques will ue pre- 
sented which apply to a large class of alignment and calibration problems, atti- 
tude determination, and position determination. The development is via matrix 
algebra so the results apply to any vector space of arbitrary finite dimension. 

THE MODEL 

Regardless of the instrumentation, physical vector quantities are generally 
being sensed — either directions (star, sun, horizon, etc.) or multiple scalar 
quantities which collectively define a vector (triads of magnetometers, gyros, 
accelerometers, etc.). Thus, the output of many sensors or actuators can be 
represented by mxl matrices (vectors) of components (or direction cosines) in 
an m-dimensional Euclidean space. Such m-tuples will be denoted as Z. Like- 
wise, the components of the physical vector (magnetic field, sun direction, etc.) 
are known relative to some underlying coordinate system. These vectors will 
be denoted by the vector X. The generalized problem of navigation could then 
be stated as the determination of the relationship between X and the measure- 
ment vector Z. For most practical systems this functional relationship can be 
expressed as 

Z = MX + V (1) 

where M is an mxn constant matrix and V is an mxl fixed matrix (vector).* 

X is an nxl vector which may be just the components of the physical vector 
being measured (in which case m = n) or more generally X may be an n-tuple 
whose entries are any known functions of the components, such as powers of 

A ( 

No distinction is made in notation between vectors and matrices except that the latter part of 
the alphabet (starting with T) will be reserved for vectors. 
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the components. As a special case X might be a scalar and M the coefficients 
of a polynomial. 

The above relationship is known as an affine transformation (linear trans- 
formation plus a translation). It includes as special cases; linear transforma- 
tions, V = 0; and translations, M = I. If M is non-singular, then it has the 
geometric interpretation of defining the orientation of one coordinate system 
relative to another. This includes a rotation of axes, reflections, non-orthogonal 
axes ('"hearing), and a different scale factor for each component. In some ap- 
plications additional constraints may be placed upon M and V. For example, 
attitude determination (orientation of rigid body with one point fixed) requires 
M to be a rotation matrix (orthogonal with determinant of plus one) and V be 
zero. Equation (1) also models a system of m single axis devices, each with 
different scale factors, located non-orthogonally from the center of the vehicle. 
Furthermore, there are no restrictions of smallness, i.e., the displacements 
may be large. 

When M, V, and X are given it is trivial to compute Z (the coordinates 
relative to a vehicle, a conglomerate of instruments, etc.) so as to point a 
sensor or actuator as desired. A more pertinent problem to the navigator, 
however, is to determine M and/or V given the local measurements Z (con- 
taining errors) and the vectors X. In this case, (1) can be considered as a sys- 
tem of linear equations containing m(n + 1) unknowns (the elements of M and V). 
Thus, if n + 1 independent vectors X k and the corresponding measurements Z k 
(each with m independent components) are known M and V are determined 
uniquely. This is not generally the case, however; one usually has insufficient 
data or an over-determined system with inconsistent equations due to 
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measurement errors. The classical approach to this dilemma is to seek an 
’’estimate” of the parameters which is best in some sense. Here, the estimate 
which minimizes the weighted sums of squares of the vector norm will be dis- 
cussed, i.e., the minimization of 


f(M.V) = £ [Zk - (M\ + V)] T P k [Z„ - (M\ + V)] 


where the P k are positive definite symmetric weight matrices (e.g., the variance- 
covariance of Z k ) indicating the relative accuracy of Z k . The different Z k may 
represent measurements from different types of instruments or readings from 
the same instrument at different times. The summation is taken over all such 
measurements f . Expanding (2), bearing in mind that P T = P , and collecting 
terms gives: 


f(M,V) = £ " 2Z k T P k MX k + (M\) T P k M\ 


+ V T p k V - 2Zj P k V + 2 V t P k M\] . 


The summation indices are hereafter omitted for convenience. Unless otherwise 
stated the summation ranges over all measurements. 


THE CONDITION EQUATIONS 


The necessary conditions that f(M,V) have a minimum are 


3 m.. 

i j 


= 0 , 


= 0 , 
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for all i and j (i = 1, 2 m; j = 1, 2 n). Formally, this results Into 

m(n + 1) linear scalar equations in the m(n + 1) unknowns (m t J and v*). A more 
elegant and informative procedure is to retain the matrix notation and express 
the resulting conditions as matrix equations rather than a large number of scalar 
equations. To facilitate this, the following definition is made: If h is a scalar 
function of an mxn matrix Q with elements q i j then the ’'gradient'’ of h(Q), de- 
noted as Vh(Q) or simply Vh, is the mxn matrix with elements (Vh) | j given by 


( Vh >ii = J 


a h 


q ii 


The gradient of several elementary scalar functiors (which are adequate for the 
present discussion) are given below: 

VhOJ) = W, if h(U) = W T U = U T W, W and U nxl matrices 
7h(Y) = (P T + P)Y, if h(Y) = Y T PY, P mxm, Y mxl 
Vh(N) = YU T , if h(N) = Y T NU, N mxn 
Vh(N) = (P T + P)NUU T , if h(N) = (NU) T PNU. 

These identities follow directly from the definition and the rules of matrix 
multiplication. For a function of several matrices, e.g., h(P,Q) the notation 
V(P)h denotes the gradient of h with respect to P only. In other words, V (P) 
operates on h as if h were a function of P alone. 

In terms of the above definitions, the condition Equations (4) can be written 


as 


V (M) f = 0, 


V(V)f = 0. 


Performing the indicated operators on (3) and equating to zero yields the fol- 
lowing simultaneous matrix equations: 
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L> k M\\ T - 1] p k Zy \ T ♦ Yi P k V Xj = 0 
1! p k V - £ p k z, * £ P k M \ = 0 . 


( 5 ) 


The Equations (5) still represent m(n + 1) linear equations, but offer a 
notational advantage over those implied by (4) in that M and V appear explicitly 
rather than their components. Apparently, generalized techniques for solving 
such systems are non-existent except to re-write the equations as a single 
matrix equation of the form CY = W where Y and W are column vectors of di- 
mension m(n +1). A convenient notation exists for accomplishing this, but the 
numerical solution may present a prodigious amount of computation even for 
relatively small n and m. 

This last representation of the problem (CY * W) could have also been ob- 
tained by using classical linear estimation results rather than the above approach. 
(1) can be re-written for each observation as Z k = C k Y, where the matrix C k is 
a function of X k and Y is a column vector composed of the unknowns m. ) and v. 
in some order (e.g., the columns of M plus V concatenated). If the classical 
least squares conditions are applied to the matrix equation representing all such 
observations (assuming each vector observation is independent) then one obtains 
the same matrix equation as that derived from re-writing (5) in the form CY = W. 
It is not the intent here to do this explicitly, but to appeal to well-known least 
squares results to insure that a solution to (5) exists. The development leading 
to (5) was selected since it gives more insight into the nature of the solution and 
is more amenable to constraints which will be considered later. 
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THE CASE OF AN EXPLICIT MATRIX SOLUTION 


The complexity of (5) can be attributed to the generalized statistical model 
rather than the geometrical model. ,1 the weight matrices are of the form 
P k = p k P for all k (P a fixed positive definite matrix and p k a positive scalar) 
then (5) may be solved explicitly in a closed simple form. Under these condi- 
tions (5) reduces to: 


PMA 0 tPVXj = PB 0 


(6) 


PMX 0 + s P V = PZ 0 , 


where 


\ = E PAV' B 0 = £ Pk Z - V • 

*0 " Pk ^ ' Z ° ~ Pic 1 

and 

s = p k • 

Since P was defined to be non-singular (6) is equivalent to 

M A = B , 

V = I ( Z o - MXo) • 


where 



( 7 ) 


( 8 ) 
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Hence, if P. = p k P, the questions of existence, uniqueness, and the solution 
itself depend only on the simple matrix equation MA = B where A is a constant 
nxn symmetric matrix and B is a fixed mxn matrix. 

If A is non-singular, then *8) has a unique solution 

M = BA * 1 

V = 1 (Z 0 - MV 

which minimizes (2) (the sufficiency of (8) for a minimum follows from the 
nature of the function and linear least squares theory). The least squares trans 
laticn is given by M = I (the identity matrix), V~- Z 0 ; whereas the least squares 
linear transformation is M = B 0 A' 1 and V = 0. 


SEQUENTIAL SOLUTIONS 

A familiar identity from recursive least squares 


with 

£ = W T C* 1 U - a , 


W T 


- 1 


= c* 1 (i - bu w* r i ) , 


(9) 


may be employed to examine three different models (full affine, translation only, 
and line r transformation only) with a single matrix inversion. If V' denotes 
the vector of the least squares translation, L the matrix of the least squares 
linear transformation, M and V the matrix and vector of the least squares affine 
transformation, then 


V’ 
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L = B 0 A-‘. 

I = X 0 T A-'X 0 - s, 

A * 1 = A 0 ->(I - rXoXjA-') , 

M = BA"’, 

= L - r(LX 0 - Z 0 )X 0 T A 0 > , 

V = j(Z 0 - MX,) . 

The above formulas imply the existence of A ~ 1 under the earlier assumption 
that A was non-singular. This is indeed the case, but the converse (A* 1 exists 
if A~ 1 exists) is not true if Xj A 0 1 X Q = s. 

The identity (9) may also be applied to a recursive solution of (8). If a 
superscript is added to each of the intermediate quantities in (7) to denote the 
summation limit, e.g., 

K = L 


a? = Aj-» ♦ P(Xi Xf T , Bg = B^-’ + p e Z f Xp T , 

xl = X^-> + p, X, , z| = zJ-‘ + PfZf , 


Therefore, 



1 - 1 


+ P? 
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<A$)-‘ = (Aj* 1 )* 1 [l - qX, X f T (Af->)-'] • 

where 


q = X^(A X F ♦ 1/pp , 
provided (Aj, -1 )’ 1 exists. 

A SINGULAR CASE 

When ti.e matrix A is singular, then there is insufficient data to define the 
model uniquely. For some applications, however, any solution which fits the 
data might be adequate. Since A is symmetric, there exists an orthogonal 
matrix Q such that A = Q DQ" 1 where D is a diagonal matrix with entries d j 
(j = 1, 2, . . ., n). Eq. (8) can then be written as M' D = B' with M' = MQ and 
B' = BQ. Let r denote the number of non-zero elements of D and assume that 
Q and D are such that these non-zero elements occupy the first r columns (rows) 
of D. The above equations then give 

m?. - j - b. . , (i - 1, 2, . . . , m) 

j 

(j = 1. 2 r) 

with m ; ] arbitrary for j = r + 1, ...,n. The consistency of the solution can be 
justified by appealing to classical linear least squares estimation theory. To be 

consistent requires b! . = 0(j =r + 1 n ). be., the last n-r columns of B' 

are zero. 

Let D* be a diagonal matrix with entries d.* = l/d j (j = 1, 2, . . ., r) and 
dj + = 0 for j = r + 1, . . ., n. The above solution can then be expressed as: 
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M Q = M' = B' D + + N , 

where N is an mxn matrix whose first r columns are zero and the last n - r 
columns are arbitrary vectors. Solving for M yields 

M = B Q D* Q- 1 + N Q- 1 

= B A ' + N O' 1 

where A* = QD + Q _1 is the pseudo-inverse of A (9) . Denoting the non-zero 
columns of N as N rM , . . N n and the corresponding columns of Q (the eigen- 
vectors associated with the zero eigenvalues of A) as Q rM , . . ., Q n , then 
NQ' 1 = 2 Nj Q } T (summed from r + 1 to n). 

Analogous to the least squares vector of least norm, one may obtain the 
unique least squares matrix of least norm (norm of M defined as tr (M T M) = 
tr (M M T ), tr denotes ’’trace of”) by letting N be the null matrix. On the other 
hand, since M may represent a linear transformation, a solution which is 
’’closest” to the identity transformation may be desirable, i.e., minimizes the 
norm of I - M when M is square. This solution is obtained by setting = Q . 

(j = r + 1, ...,n). Both of these special solutions are easily verified by using 
the well-known properties of the pseudo-inverse and trace fu Jon. 

The vector V is always given by the second equation of (8) whether A is 
non-singular or not. Thus, in the singular case 

V = ; [z 0 - BA*X„ - £ (Q, T *,)*»,] 

(summation from r + 1 to n). If Q T X Q is non-zero for some j, then the arbi- 
trariness of N may be used to eliminate V instead of imposing the above con- 
ditions on M. 
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THE GENERAL SOLUTION 

Thus far, the solution to the simultaneous matrix equations (5) , which 
provides the minimum of (2) , has been exhibited under all conditions for the 
special weighting P k = p k P. If each vector measurement Z k is statistically in- 
dependent and the variance-covariance matrices of each Z k differ only by a 
multiplicative constant then these solutions provide the minimum variance solu- 
tions. This is the situation when all vector measurements relate to the same 
type of instrument (whose variance-covariance matrix varies only by a scalar) 
or when each vector measurement is composed of m independent scalar measure- 
ments with the same variance. In the general case, one might forsake a mini- 
mum variance requirement in order to obtain a simple solution by assigning a 
single weight to each vector. In many instances this may be an adequate solution 
or serve as an initial approximation for an iterate scheme. 

As noted previously, the computational complexity of the problem soars 
when the general weight is considered. The dimension of the matrix to be in- 
verted is increased by a factor of m which enlarges the computations by a factor 
of order m 3 . 

Should accuracy considerations dictate the additional effort, however, there 
is available notation for deriving the larger system of linear equations in matrix 
form. This is via the direct product (also tensor or Krcnecker product) of 
matrices. If A is an mxn matrix and B a m'xn' matrix, the direct product of A 
and B, in that order, is defined by the partitioned matrix 
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n 1 1 ® a 1 2 ® 8 1 n ® 


A© B = 


a 21 ® n 2 2 ® a 2n B 


a . B a a , B ... a B / 

A © B is an (mm )x(nn ) matrix. For properties of A(©B when A and 0 are 
square see | 7) or | 8) . For an mxn matrix C, let C denote the (mn)xl column 
vector whose components are the elements of C ordered by rows ( c . = c , 

k - J «• (i-l)n; i®1.2 m; j * 1, 2 n). For an mxl matrix (vector) 

V. V V. With these definitions, it is straightforward to verify that if C = ANB 
then 

C = ( A © B T ) N . 

In terms of the above notation, the conditions (5) may be expressed as: 


F M + G V = B 0 . 


G T M + P V = Y . 


where 


r = £ P k ©\\ T . c = £ p w© V 


( 10 ) 


B 


0 \ * Z 0 ^ ^k ^k * 


and 


P = 


L 


p, . 
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Lancaster |6| discusses this notation as well as the direct solution of matrix 
equations similar to that of (5) for V - 0 

Since P is non-singular (the sum of positive definite matrices) 

V = P 1 (Z 0 - G t M), 

and M must satisfy 

(F - GP ” 1 G T ) M - B 0 - G P ' 1 Z 0 . 

The form of these last two equations is similar to that ol the simple weight 
case (H) . hut the order of the coefficient ma.rix may be considerably higher. 

As with the simple weight case, the translation, linear transformation, and 
lull affine models are separable by a generalization of (9). Explicitly. 

(F - G P“ 1 G T )“ 1 = F ~ 1 (I - GKG t F 1 ) , 

with 

P + K 1 = G T F 1 G, 

provided the indicated inverses exist. 

CONSTRAINTS 

In some applications one may have a priori knowledge about the nature of M 
and desires to restrict or constraint M to be a particular type of matrix. One 
important application is where M is known to be the matrix of a rotation (ortho- 
gonal with determinant of + 1) which defines the orientation of a rigid body. 

Other types of special matrices which will be discussed are: orthogonal 
(M t M - I), symmetric (M T -M = 0), and skew-symmetric (M T + M = 0). 
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The definitions of the special matrices above are all expressable as a 
number of scalar equations of the type g, ] (M) = k , ) (k , ] constant). Thus, all 
of the constraints may be handled by the method of Lagrange. This requires 
the minimization of 

h(M.V) = f(M.V) ♦ g (M) , 
where f(M, V) is as in (2) and 

8(M) = £ x ij g .i 

(summed over all scalar constraint equations). The are the Lagrangean 
multipliers to be determined and as the notation implies are considered as ele- 
ments of an unknown matrix A. 

The necessary conditions for the constrainted minimum in terms of the 
gradient defined earlier are now 

V (M) h = V (M) f + V (M) g = 0 , 

V (V) h = V(V)f = 0. 

The results of performing the operator are then the same as (5) with the terms 
associated with V(M)g added to the first equation. 

The gradient operator can be readily applied to the function g(M) for each 
of the special matrices being considered. The results are: 

Symmetric: g^ = Vg = -2A, 

where 

A t = - A ; 
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Skew-symmetric: g tJ 

where 


m j i f m i j • v K = 2 A • 


A t 


A 


Orthogonal: 


K. 


n 

z; 

k = 1 


m, 


m 


k J 


Vg = 2 M A , 


where 


A T = A . 


The rotation matrix is the same as the orthogonal case with one additional scalai 
constraini, namely d(M) = 1 (d(M) denotes determinant of M). Since orthogonal 
solutions will include the rotations, the rotation case will be considered as a 
special case of the orthogonal one rather than by adding another Lagrange 
multiplier. 


ORTHOGONAL AND ROTATIONAL SOLUTIONS 


The resulting matrix equations for the orthogonal case are: 

Z>k M ^ v + L p k v v + = v 

jy, M \ + r> v = z 0 . 

M T M = I , 

A T = A. 


ni) 


where B 0 , Z 0 , and P are defined by (10). M and A are square matrices to be 
determined so that all four matrix equations in (11) are satisfied. 

The Equations (11) are quite complex and owing to the non-linearity of two 
of the equations cannot be handled by any of the techniques used heretofore. 
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Fortunately, an explicit solution does exist for a simple weighting which allows 
insight into the geometry of the problem as well as the effects of weighting. 

When a single weight is associated to each vector, i.e., P k = p k I (p k positive 
and I the identity matrix) then the first two equations of (11) can be written as: 

M(A 0 ♦ A) ♦ VX 0 T = B 0 , 

MX 0 + sV = Z 0 , 

with A 0 , X 0 , and s given by (7). Solving the second equation for V and substi- 
tuting this into the first equation yields 

M(A ♦ A) = B, (12) 

where A and B are defined as they were in (8). From (12) one deduces that 

B T B = (A ♦ A) M T M( A + A) = (A + A) 2 

since A + A is symmetric and M T M = I. Letting H = (A + A) gives 

H 2 = B t B, 

(13) 

MH = B. 

These last two equations show the dependence of the solution on the square 
root of the matrix B T B. If H is a non-singular symmetric matrix such that 
H 2 - B t B, then A = H - A is symmetric and 

M = BIT 1 (14) 

is orthogonal for M T M = H* 1 B T BH' 1 = H~ 2 H" 1 = I. Thus, the M and A just 
defined afford solutions to the Equations (11) when P k = p k I for all k. 
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The conditions imposed on II thus fur are not enough to insure uniqueness, 
e.g , -H is also a non-singular symmetric square root of B T B if H is. Further- 
more, the condition equations in (11) only assure that the extremal values of f 
are a subset of all solutions. In order to establish the true minimum, the effects 
of the solutions on the value of f must be examined. 

If U is any column vector then U T U = tr(UU T ). From this, it is straightfor- 
ward to show that 

f (M, V) = tr(C) + tr(MAM T ) - 2tr (MB T ) 

where C is a constant matrix independent of M. Since tr(MAM T ) = tr(M T MA) = 
tr(A) when M T M = I, the above expression may be written as: 

f (M. V) = tr(A + C) - 2tr(MB T ). 

Hence, the mininv m of f is provided by the maximum of tr(MB T ) which is equal 
to tr(H) for Mil = B. This establishes that the desired minimum is obtained 
when H is the symmetric square root of B T B with largest trace. 

Now’ it is well-known that for an arbitrary square matrix B that B T B is 
positive semidefinite. In fact, a classical result of matrix algebra states that 
any square matrix B can be factored as in (13) with M orthogonal and H a unique 
positive semidefinite matrix (polar decomposition, see [5) or (8)). From the 
discussion above, it is then clear that this choice of H provides the desired 
minimum of the function f for the orthogonal case. If B is singular, however, 
then any square root of B T B is also singular and the solution (14) is invalid (the 
orthogonal part of the polar decomposition is not unique). The construction of 
the solution for the singular case actually constitutes a proof of the polar 
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decomposition theorem. However, since tte theorem does not cover the rota- 
tional case even when B is non-singular, a construction which includes both the 
singular and rotational cases is jlvefi below. 


Since B t B is symmetric and positive semidefinite. there ip an orthogonal 
matrix N such that N“ l B T BN * D' where D' is a diagonal matrix with non- 
negative entries d j , (i = 1, 2 m; m the dimension of B). The ith column 

of N (denoted as NJ is a unit eigenvector of B T B corresponding to the eigenvalue 
dj i . Let D be a diagonal matrix such that D 2 = D (d 2 , * d' t ,). then the above 
matrix equation implies 


(BN i ) T BN j = d^ = 0, for i 7 j 

and 

IbnJ 1 = d* . 

Hence, a complete set of orthonormal vectors Q 4 (i = 1, 2 m) can be con- 

structed so that BN 4 d, ,Q 4 . Let Q denote the orthogonal matrix obtained by 
juxtoposing the column vectors Q i in proper order so that BN = QD, thus 
B=QDN“ l . Now, set 


M = QN* 1 . 

H = N D N' 1 , 


(15) 


and wdth the definitions above it is easy to verify that M and H satisfy the Equa- 
tions (13).* Note also that Q' ! BB T Q = D 2 = D\ hence, the d' l4 are also eigenvalues 
of BB r with eigenvectors Q i . 


* 


Another polar decomposition is B = SM with S = QDQ 


-1 
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The trace of H. as given in (15), is just the sum of the entries d, . of D 
which is a square root of D'. Therefore, the orthogonal least squares solution 
is obtained by taking all positive square roots in the definition of D. If the re- 
sulting M also has a positive determinant then M is also the rotational least 
squares solution. This will be the case when d(B) - d(D)d(M) > 0. If d(B) < 0 
then the least squares rotation is obtained by changing the sign of the smallest 
d, ( in the definition of D, i.e., all entries of D are positive except the one with 
least absolute value. When d(B) : 0 (B singular) the constructed M (using non- 
negative d, k ) may or may not have a positive determinant. Should the determi- 
nant of M be negative then changing the sign of any vector Q t (the ith columi. 
of Q) corresponding to a zero d 4 1 will change the sign of d(M) without changing 
the value of tr(H) = tr(D). The orthogonal solution is unique provided d(B) ^ 0. 
The rotational solution is unique unless the smallest eigenvalue is a multiple 
root and d(B) < 0. 

There are many interesting interpretations of the matrix M just constructed. 
It is the orthogonal or rotation matrix which: (1) Minimizes Equation (3) for the 
assumed weighting, (2) Maximizes tr(MB T ) thus minimizes the norm of B-M, 

(3) Provides a polar decomposition of B, and (4) Is a solution of the matrix 
equation MB T - (MB T ) T = 0. This last equation stems from the decomposition 
B = SM, its significance will be discussed later. It is also noteworthy that the 
orthogonal or rotational M is completely independent of the matrix A whereas 
the unconstrainted solution was highly dependent upon A. 

ATTITUDE DETERMINATION 

Since the advent of the space age a wide variety of problems have been 
pursued within the label of attitude determination. These include determination 
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of: (1) The orientation of all three coordinate axes of a stabilized vehicle, 

(2) The direction and/or rate of the spin axis of a spin-stabilized vehicle, and 

(3) Either of the above as a function of time. Each of these problems has been 
solved for a wide range of sensors. In addition, many parametrizations 

of the rotation group have been employed as the independent parameters to be 
determined. The weighted least squares rotation matrix of the previous section 
provides a solution to many of these problems without re-formulating equations 
for each new sensor. It is applicable to any sensor whose output can be formed 
into a vector of components relative to the coordinate system whose ’’attitude” 
is to be determined. The desired parameters can then be obtained from the 
determined matrix. 

If the model assumes that the attitude is being held by inertial sensors 
then readings at different times may be combined. This technique has been 
used successfully for the Orbiting Astronomical Observatory (2) to the extent 
of determining attitude from magnetometer data only while in darkness (after 
the magnetometers were aligned by techniques herein). When the attitude is a 
function of time and M t is the least squares attitude matrix at time t } and M 
the least squares matrix at t 2 then AM = M 2 M defines the spin axis and angle 
of rotation during the interval 1 2 -t v If the spin axis and rate are considered 
constant and the data taken at equal time intervals; a least squares estimate of 
the rate matrix may be obtained by denoting the measured vectors at t t as X’s 
and those taken at t i + 1 as Z’s. 

The usual statement of the attitude problem is without the translation since 
position and orientation data are provided by different sensors. For this case, 
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the second equation of (11) is omitted and V set to zero. This still leads to 
equations of the form (13), but with a simplified B matrix, i.e., 

B = B 0 = £ P k Z k V- 

Since p k is assumed positive, let p k - q k 2 ; the matrix B may then be written as 

n = V u v T 

u L-* k v k ’ 

with V k = q k X k and W k = q k Z k . With this notation 

tr(MB T ) = tr (£ M V„ W * ) = £ W * M V k = £ W„ • (M,V k ) (16) 

As was mentioned earlier, every M satisfying (13) is a solution of the matrix 
equation 


MB T - (MB t ) t = 0. (17) 

Thus, the desired solution is among the solutions of (17). For the present case 
(B = B Q ) this yields 

2]M v k ) W k T - \ (M v k ) T = 0 . (18) 

The left-hand side of the above equation is a skew-symmetric matrix, thus 
represents only n(n- l)/2 independent scalar equations. In three-space, such a 
matrix is isomorphic to a 3x1 vector. Let 0 denote the skew-symmetric matrix 
formed from the vector U such that for any vector V, UV = U * V. Then the 
independent scalar equations given by (18) can be expressed as: 

L w k *( MV k > = °- < 19 > 
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In summary, the desired weighted least squares attitude matrix is that rotation 
matrix which satisfies (19) and maximizes (16). 

Expressing the condition equations in terms of the cross and dot product 
provides a physical or geometric interpretation of the solution which leads to 
some interesting observations. The simplest case is when only one measured 
vector is available. In this case the answer is not unique, but a rotation about 
the line W t x V, by the angle between W, and Vj provides the ’’shortest” ro- 
tation | 2) . Note that an error in the length of Z j does not effect the ans /er. 

A more practical situation and perhaps the one that has received the most 
attention is when two measurements are given. Equation (19) requires that the 
plane defined by Vj and V 2 (plane I) be rotated into the plane defined by Wj and 
W 2 (plane II) such that 

lw, X (MV,)| = | (MVj) xWjl . 

This last condition requires the area of the triangle formed by W lt MV lf and 
W j - MVj (triangle I) to be equal to the area of the triangle formed by W 2 , MV 2 , 
and W 2 -MV 2 (triangle II). Since M preserves length ( |MV | = |V i ) this require- 
ment may be stated as: 

I Wj I I V , | sin 0 j = |W,| |v 2 | sin 0 2 , 

or 

pj |zj |Xj sin (9 j = p 2 | Z 2 1 |X 2 | sin 6 2 , (20) 

where 0 is the positive angle between MX^ and Z ., i = I, 2. 
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The requirement that plane I be rotated into plane II may be satisfied as 
follows: Let N 1 and N 2 be orthogonal vectors of unit length in plane I and 
N 3 = N, x N 2 . Similarly, let Qj and Q 2 be orthogonal unit vectors in plane II 

and Q 3 = Q { x Q r N = (N Jt N 2 , N 3 ) and Q = (Qj. Q 2 . Q 3 ) are then rotation ma- 

trices and the matrix M such that MN = Q or M = QN" 1 is a rotation matrix 
which indeed satisfies the requirement. Note that this form of M is precisely 
that of the polar decomposition solution given by (15). It remains to define N lt 
N 2 , Q , and Q 2 explicitly so that all other conditions are satisfied. 

It is geometrically obvious and a routine matter to show that when 
| V j | = |V 2 | and |W | = | W 2 ! the correct solution is given by: 

n; = Vj-Vj.Nj = v, + v 2 . n 4 = n; /In;| , (i = l, 2> 

q; = w, - w a . Qj = w 1 + Wj,q, = q;/Iq;|. 

This suggests a general solution of the form N| = xV t - V 2 and N 2 = yV t + V 2 
where x and y are scalars of proportionally depending on the relative lengths 
of the given vectors. Indeed, it can be shown that, if 

Vj • Vjxy + Vj • V 2 (x - y) - V 2 • V 2 = 0, 

and (21) 

W 2 • W 2 xy - Wj • W 2 (x - y) - W t • Wj = 0, 

then Nj and N 2 are eigenvectors of B T B (polar decomposition). The first 
equation of (21) is just the requirement that and N' 2 be orthogonal. The 
vectors BNJ and BN' 2 are proportional toQ’ 1 =W 1 -yW 2 and Q' 2 = Wj + xW 2 
respectively, i.e., Q \ and Q 2 are eigenvectors of B B T and the second equation 
of (21) is the condition for their orthogonality. 
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Solving the simultaneous Equations (21) yields 

x - y = a/c , 

xy = b/c, 

with 

a = (V 2 -V 2 )(W 2 -W 2 )-(V 1 -V 1 )(W 1 -W l ), b = (W 1 -W l )(V 1 -V 2 )MV 2 -V 2 )(W 1 -W 2 ), 

C = ( V 1* VjXWj* w 2 ) + (W 2 -W 2 )(V l *V 2 ). 

These last two equations have two solutions, given by 

a ± l/a 2 + 4bc - a ± /a 2 + 4bc 

x = 2c ’ y = 2c * 

where the signs of the radicals must be consistent. Both solutions will also 
satisfy (19) if 


n; 

xV t - V 2 

. Nj = 

yV, + v 2 , n, 

= n;/|n;| , (i = i, 2) 

q; 

= w, - yw 2 

• q; = 

w, + xw 2 , Qi 

= q;/Iq;I. 

N 3 

= N,xN 2 , 

= 

Q,xQ 2 , 


N 

= (N,, N 2 , 

N 3 ). 

Q = (Q r Q 2 . 

q 3 ). 


and 

M = Q N’ 1 . 

The solution that maximizes (16) may be ascertained by expressing V . V 2 , W p 
and W 2 in terms of N [ , N 2 , Q[ , and Q 2 . When this is done, it is found that 
x and y must also satisfy the condition x + y > 0. The sign of the radical in the 
definitions of x and y can be selected so that this condition is met. This, then, 


25 


28 

completes the solution when two measured vectors are given. This solution 
appears in 12) without proof and Fraiture (4) offers a different construction. 

Perhaps, the most important aspect of the two measurement solution is the 
insight it provides to the effects of weighting. The length of each of the vectors 
Wj and W 2 obviously affects the solution, and their lengths are a function of the 
weights as well as the lengths of the measured vectors Z x and Z y Therefore, 
an error in the length of either Z l or Z 2 has the same effect as a weight factor 
and biases the solution. This implies that the length of the measurement Z x 
should be made equal to that of X A . One arrives at the same conclusion by 
arguing that, since the rotation cannot change the length, any deviation in length 
is due to measurement noise (provided any misalignment has been eliminated) 
and should be removed. Likewise, the lengths of X { and X 2 may bias the solution 
if they differ. This is particularly true when a sensor measuring only direction 
is combined with one measuring length as well. Thus, it is concluded, that to 
obtain an unbias attitude {rotation) matrix all data should be normalized so that 
the only length appearing in the V’s and W's is that due to the weighting factors. 
This same conclusion is reached in [2] , but by a completely different argument. 

When more than two measurements are given the polar decomposition of 
the previous section provides a solution. The solutions in [31 also depend upon 
the polar decomposition in slightly different form. [1 ) offers two solutions 
quite different in nature. The polar decomposition solution indicates that care 
should be taxen when employing an iterate or differential correction type of 
solution. If the matrix B is non-singular then there are four solutions to the 
condition equations, all satisfying the constraints: assume d(B) > 0, then the 
desired solution is with all three positive square roots, however, any combination 
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of two negative and one positive entries in D also yields a rotation matrix. A 
similar argument exists for d(B) < 0. These spurious solutions could cause false 
convergence with a poor initial estimate. 

SYMMETRIC AND SKEW-SYMMETRIC SOLUTIONS 

The constrainted equations for the symmetric and skew-symmetric cases 
are very similar, differing only by a plus or minus sign. Because of this, the 
solutions are also similar and will be treated together. The resulting equa- 
tions are still linear in M and A; thus, the general case can be handled by re- 
constructing the independent scalar equations into a single matrix equation. 

As with the previous cases discussed, the weighting P k = p k I offers a sim- 
plified solution. In this case, the equations to be solved are: 

M A = B ± A, 

M t = ± M , 

A t = T A . 

The upper signs pertain to the symmetric case, whereas the lower signs denote 
the skew-symmetric case. The matrices A and B are as previously defined. 

Once M has been determined, V is given as in (8). 

From the above equations, one deduces that 

B±B t = MA±(MA) t = MA+AM. (22) 

rince A is symmetric, there exists an orthogonal matrix Q such that Q" 1 AQ = D 
is a diagonal matrix. Equation (22) is then equivalent to: 

M'D + DM' = Q" 1 (B ± B t )Q 
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with M' = Q^MQ. The component equations are 

(d ii ♦ d n ) mjj = s ijt (i f j = 1,2 n) 

where S = Q* 1 (13 ± B T ) Q is symmetric or skew-symmetric depending on the 
case being considered. Therefore, M' is symmetric or skew-symmetric re- 
spectively and it follows that M = Q M' Q _1 is also. 
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